# I/O Devices & SSD

# **System Architecture**

- Hierarchical approach
- Memory bus
  - CPU and memory
  - Fastest
- I/O bus
  - e.g., PCI
  - Graphics and higherperformance I/O devices
- Peripheral bus
  - SCSI, SATA, or USB
  - Connect many slowest devices



### **Device Structure**

#### Hardware Interface

- Allows the system software to control its operation
- status register: can be read to see the current status of the device
- command register: can be written to tell the device to perform a certain task
- data register: transfer data to/from the device

#### Internal Structure

- Implementation specific
- Simple devices
  - have one or a few hardware chips to implement their functionality;
- Complex devices
  - include a simple CPU running firmware, some general purpose memory, and other device-specific chips



### **Protocol**

 By reading and writing internal registers of a device, the operating system can control device behavior.

- Inefficiencies and Inconveniences
  - Polling
    - Repeatedly reading the status register
  - Programmed I/O (PIO)
    - The main CPU is involved with the data movement

## **Lowering CPU Overhead With Interrupts**

#### Interrupt

- OS can issue a request, put the calling process to sleep, and context switch to another task.
- When the device is finally finished with the operation, it will raise a hardware interrupt, causing the CPU to jump into the OS at a pre-determined interrupt service routine (ISR)
- The handler in operating system code that will finish the request
  - e.g., reading data and perhaps an error code from the device
- Wake the process waiting for the I/O
- Allow for overlap of computation and I/O
- Interrupt is not always the best solution
  - If a device is fast, it may be best to poll.
  - Otherwise, use interrupt

# **Interrupt Optimization**

### Unknown device: hybrid approach

 polls for a little while and then, if the device is not yet finished, uses interrupts. (two-phased approach)

#### Network systems

- When a huge stream of incoming packets each generate an interrupt, it is possible for the OS to livelock
  - only processing interrupts and never allowing a user-level process to run
- Occasionally use polling to better control what is happening in the system (e.g., slashdot effect)

### Interrupt Coalescing

- Multiple interrupts are coalesced into a single interrupt delivery
  - lower the overhead of interrupt processing
- Long waiting
  - Many interrupts can be coalesced
  - Increase the latency of a request

### More Efficient Data Movement With DMA

- OS programs the DMA engine
  - Source/destination address in memory or device
  - Data amount
- Other processes can be serviced
- When the DMA is complete, the DMA controller raises an interrupt, and the OS thus knows the transfer is complete.



### **Methods Of Device Interaction**

#### I/O instructions

- E.g., in and out on x86
- To send data to a device, register → port.
- Usually privileged instructions

#### Memory-mapped I/O

- Device registers are mapped to memory address space
- load (to read) or store (to write)

### **Device Driver**

- How can we keep most of the OS device-neutral, thus hiding the details of device interactions from major OS subsystems?
- Solution: Abstraction
  - The device driver in the OS know in detail how a device works
  - Any specifics of device interaction are encapsulated within D/D



Issues block requests to the generic block layer

Routes block requests to the appropriate D/D

handles the details of issuing the specific request

File System Stack

### **Device Driver**

- Drawbacks of encapsulation
  - Even if a device has many special capabilities, it has to present a generic interface to the rest of the kernel, those special capabilities will go unused.
  - e.g., SCSI devices which have very rich error reporting
- Over 70% of OS code is found in device drivers
  - for any given installation, most of that code may not be active
  - have many more bugs and thus are a primary contributor to kernel crashes

Dongkun Shin, SKKU  $1\mathrm{C}$ 

# **Disk System**



# **Disk System**

- Data access in disk system
  - Seek time
  - Rotational delay (latency time)
  - Data transmission time
  - 3. Disk rotation until sector has passed under the head:Data transfer time (< 1 ms)</li>
- 2. Disk rotation until the desired sector arrives under the head:
  Rotational latency (0-10s ms)



## **Disk System**

#### Average time to access some target sector

- $-T_{access} = T_{avg-seek} + T_{avg-rotation} + T_{avg-transfer}$ 
  - Seek time (T<sub>avg-seek</sub>)
    - Time to position heads over cylinder containing target sector
    - Typical T<sub>avq-seek</sub> is 3∼9ms
  - Rotational latency (T<sub>avg-rotation</sub>)
    - Time waiting for first bit of target sector to pass under head
    - $-T_{avg-rotation} = 1/2 \times 1/RPMs \times 60s/1min$
    - Typical  $T_{avq-rotation} = 1 \sim 4ms$
  - Transfer time (T<sub>avg-transfer</sub>)
    - Time to read the bits in the target sector
    - $-T_{avg-transfer} = 1/RPM \times 1/(avg. # sectors/track) \times 60s/1min$

### Flash-based SSD

### NAND flash memory

- Page: read/write unit, 4KB~16KB
- Block: erase unit, erase-before-write,
   128 KB or 256 KB
- Limited lifetime: wear out
- SLC, MLC, TLC





## Flash Operations

### Read (a page)

| D 1                                           |        | Kead | Program   | Erase     |
|-----------------------------------------------|--------|------|-----------|-----------|
| <ul><li>Random access</li></ul>               | Device | (μs) | $(\mu s)$ | $(\mu s)$ |
| Erase (a block)                               | SLC    | 25   | 200-300   | 1500-2000 |
|                                               | MLC    | 50   | 600-900   | ~3000     |
| <ul><li>set each bit to the value 1</li></ul> | TLC    | ~75  | ~900-1350 | ~4500     |

Dood

### Program (a page)

change some of the 1's within a page to 0's

| Page 1   | Page 2                                                                  | Page 3                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                                                               |
|----------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 11001110 | 0000001                                                                 | 00111111                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                               |
| VALID    | VALID                                                                   | VALID                                                                                                                                                                                                                           |                                                                                                                                                                                                                                                                                                                                               |
| _        | _                                                                       | _                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                               |
| Page 1   | Page 2                                                                  | Page 3                                                                                                                                                                                                                          | _ <                                                                                                                                                                                                                                                                                                                                           |
| 11111111 | 11111111                                                                | 11111111                                                                                                                                                                                                                        |                                                                                                                                                                                                                                                                                                                                               |
| ERASED   | ERASED                                                                  | ERASED                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                                                               |
|          |                                                                         |                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                               |
| Page 1   | Page 2                                                                  | Page 3                                                                                                                                                                                                                          | _ <                                                                                                                                                                                                                                                                                                                                           |
| 11111111 | 11111111                                                                | 11111111                                                                                                                                                                                                                        | ] [                                                                                                                                                                                                                                                                                                                                           |
| ERASED   | ERASED                                                                  | ERASED                                                                                                                                                                                                                          | _                                                                                                                                                                                                                                                                                                                                             |
|          | 11001110<br>VALID<br>Page 1<br>11111111<br>ERASED<br>Page 1<br>11111111 | 11001110         00000001           VALID         VALID           Page 1         Page 2           11111111         11111111           ERASED         ERASED           Page 1         Page 2           11111111         11111111 | 11001110         00000001         00111111           VALID         VALID         VALID           Page 1         Page 2         Page 3           11111111         11111111         11111111           ERASED         ERASED         ERASED           Page 1         Page 2         Page 3           11111111         11111111         11111111 |

**Erase Block** 

Program Page 0

Previous contents of Page 1, 2, 3 are all gone! They must be moved before the block erase operation.

# Flash Performance And Reliability

#### Wear out

- Erased/program slowly accrues a little bit of extra charge.
- As that extra charge builds up, the block becomes unusable.
- MLC block: 10,000 P/E (Program/Erase) cycle lifetime;
- SLC block: 100,000 P/E cycles.

#### Disturbance

- When accessing a particular page within a flash, it is possible that some bits get flipped in neighboring pages
- read disturbs or program disturbs









cell-to-cell interference

### From Raw Flash to Flash-Based SSDs

- Flash-based SSD provides standard block interface atop the raw flash chips inside it.
  - Sector (512 KB) read/write



Dongkun Shin, SKKU  $\Gamma$ 

# **Samsung Galaxy Note8**



### **Solid-State Disk**





INTEL COULD STATE DUTUE

**SATA SSD** 



M.2 NVMe

**NVMe SSD** 



**MacBook Air** 





### FTL

- FTL takes read and write requests on logical blocks and turns them into low-level read, erase, and program commands on the underlying physical blocks and physical pages
- Performance issues
  - Utilizing multiple flash chips in parallel
  - Reduce write amplification
    - total write issued to flash chips/total write issued by host
- Reliability issues
  - Wear leveling
    - spread writes across the blocks of the flash as evenly as possible
    - ensure that all of the blocks of the device wear out at roughly the same time
  - Program disturbance
    - sequential-programming
      - program pages within an erased block in order
      - low page → high page

## **FTL: Direct Mapped**

### A write to logical page N

- Read in the entire block that page N is contained within
- Erase the block
- Program the old pages as well as the new one.
- Read-modify-write approach
- Poor performance & high write amplification
- The physical blocks containing popular data will quickly wear out.

# A Log-Structured FTL

- Upon a write to logical block N
  - The device appends the write to the next free spot in the currently-being-written-to block
  - no overwrite, out-of-place update
  - To allow for subsequent reads of block N, the device keeps a mapping table, which stores the physical address of each logical block.
- Problems: GC & mapping table



What happens if the device loses power?

## **Garbage Collection**

- dead-block reclamation to provide free space for new writes
  - find a block that contains one or more garbage pages
  - read in the live (non-garbage) pages from that block, write out those live pages to the log
  - (finally) reclaim the entire block for use in writing
- Must know
  - Whether each page is live or dead → mapping table or bitmap
  - The number of live pages in each block for victim selection
- Overprovision can delay GC, and push to the background



## **Mapping Table Size**

- Page-level mapping
  - With a large 1-TB SSD, a single 4-byte entry per 4-KB page results in 1 GB of memory needed the device



# **Mapping Table Size**

Block-level mapping



# **Update at Block Mapping**



# **Hybrid Mapping**

- To enable flexible writing but also reduce mapping costs
- FTL keeps a few log blocks and directs all writes to them
  - Page-level mapping (log table)
  - Small number of log blocks → small-sized log table
- Normal data blocks
  - Block-level mapping (data table) → small-sized data table
- For read request
  - first consult the log table; if not found, consult the data table
- To keep the number of log blocks small, log blocks must be merged with data blocks if no free space is available in log blocks.

# **Hybrid Mapping (BAST)**



# **Hybrid Mapping – Merge Operation**



29

### **DFTL**



### **SSD Performance**

- Random I/O
  - Large difference between SSD and HDD
- Sequential I/O
  - Much less of a difference between SSD and HDD
  - HDD still a good choice
- SSD random read performance is not as good as SSD random write performance
  - Write buffer
- Performance difference between sequential and random in SSD
  - many of the file system techniques for HDDs are still applicable to SSDs

|                          | Random |        | Sequential |        |
|--------------------------|--------|--------|------------|--------|
|                          | Reads  | Writes | Reads      | Writes |
| Device                   | (MB/s) | (MB/s) | (MB/s)     | (MB/s) |
| Samsung 840 Pro SSD      | 103    | 287    | 421        | 384    |
| Seagate 600 SSD          | 84     | 252    | 424        | 374    |
| Intel SSD 335 SSD        | 39     | 222    | 344        | 354    |
| Seagate Savvio 15K.3 HDD | 2      | 2      | 223        | 223    |